DETAILED ACTION
Response to Arguments 

1. The previous non-final office action has been withdrawn in favor of a new
non-final office action.

Claim Rejections - 35 USC § 103 

2. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

3. Claim 47 is rejected under 35 U.S.C. 103(a) as being unpatentable over 
Bonastre et al. (IEEE Publication) in view of King (US 6532446). 

4. Regarding claim 47, Bonastre et al. disclose a speech recognition processing an 
incoming audio stream containing human speech from a plurality of speakers and 
having at least two speaker models and/or speaker-specific dictionaries, comprising: a 
detector which detects a speaker change in the incoming audio stream (sections 2.1-2.2 
on page 1178 and referring to abstract section); a gather which gathers speaker-specific 
information with corresponding speaker-specific information of at least one 
predetermined known speaker from among the plurality of speakers thus recognizing 
the at least one predetermined speaker (sections 2-3.2, input speech is processed to 
extract speech features, which are then compared with speech models of each enrolled
speaker to determine a match). 

Bonastre et al. fail to specifically disclose an interchanger which interchanges 
between the at least two speaker-specific dictionaries dependent on the detected 
speaker change and the corresponding recognized speaker. However, King further 
teaches an interchanger which interchanges between the at least two speaker-specific 
dictionaries dependent on the detected speaker change and the corresponding 
recognized speaker (col. 5, lines 26-47 and col. 6, lines 35-67, user-specific files are
retrieved to process the user's input speech).

Because the modified Bonastre et al. and King are analogous art from the same field
of endeavor, it would have been obvious to one of ordinary skill in the art at the
time of invention to modify Bonastre et al. by incorporating the teaching of King in
order to provide improved speech recognition accuracy.

5. Claims 19-20, 22-26, 28-31, 33-39, 41-46, and 48-49 are rejected under 35
U.S.C. 103(a) as being unpatentable over Bonastre et al. (IEEE Publication) in view of 
Glickman et al. (US 6067059), and further in view of King (US 6532446). 

6. Regarding claims 19, 31, 34, and 48, Bonastre et al. disclose a method, 
apparatus, and a program storage device readable by machine for processing a 
continuous audio stream containing human speech from a plurality of speakers related 
to at least one particular transaction, comprising the steps of: identifying a known 
speaker from among the plurality of speakers (abstract section page 117); digitizing the
continuous audio stream (ADC is inherently included in a digital system); detecting a 
speaker change in the digitized audio stream (sections 2.1-2.2 on page 1178 and 
referring to abstract section); performing a speaker recognition if a speaker change is 
detected (section 3 on page 1179). 

Bonastre et al. fail to disclose the step of transcribing at least part of the 
continuous audio stream if a predetermined speaker is recognized, and wherein each 
speaker is processed using a different dictionary of different topics (each enrolled 
speaker has their own models stored in the system before runtime). However, 
Glickman et al. teach the step of transcribing at least part of the continuous audio 
stream if the known speaker is recognized (col. 5, lines 30-67).

Because Bonastre et al. and Glickman et al. are analogous art from the same field of
endeavor, it would have been obvious to one of ordinary skill in the art at the time
of invention to modify Bonastre et al. by incorporating the teaching of Glickman et
al. in order to provide automatic closed captioning using speaker-dependent models to
enhance speech recognition accuracy.

The modified Bonastre et al. fail to specifically disclose that each speaker is 
processed using a different dictionary of different speaker-trained data. However, King 
further teaches that each speaker is processed using a different dictionary of different 
speaker-trained data (col. 5, lines 26-47 and col. 6, lines 35-67, user-specific files are
retrieved to process the user's input speech).

Because the modified Bonastre et al. and King are analogous art from the same field
of endeavor, it would have been obvious to one of ordinary skill in the art at the
time of invention to modify Bonastre et al. by incorporating the teaching of King in
order to provide improved speech recognition accuracy.

7. Regarding claims 25, 35, 39, 43, and 49, Bonastre et al. disclose a method, 
apparatus, and program storage device readable by machine for processing a 
continuous audio stream containing human speech of a plurality of speakers related to 
at least one particular transaction, comprising the steps of: identifying a known speaker 
from among the plurality of speakers (abstract section page 117); digitizing the 
continuous audio stream (ADC is inherently included in digital systems); detecting a 
speaker change in the digitized audio stream (sections 2.1-2.2 on page 1178 and 
referring to abstract section); performing a speaker recognition if a speaker change is 
detected (section 3 on page 1179); and wherein each speaker is processed using a 
different dictionary of different topics (each enrolled speaker has their own models 
stored in the system before runtime). 

Bonastre et al. fail to disclose the step of indexing the audio stream with respect 
to the detected speaker change if the known speaker is recognized. However, 
Glickman et al. teach the step of indexing the audio stream with respect to the detected 
speaker change if the known speaker is recognized (col. 5, lines 30-67, labeling "Bob" or
"Alice" to transcribed text of corresponding audio segments).

Because Bonastre et al. and Glickman et al. are analogous art from the same field of
endeavor, it would have been obvious to one of ordinary skill in the art at the time
of invention to modify Bonastre et al. by incorporating the teaching of Glickman et
al. in order to enable the system to use speaker-specific speech recognition models
for a particular speaker to improve speech recognition accuracy.

The modified Bonastre et al. fail to specifically disclose that each speaker is 
processed using a different dictionary of different speaker-trained data. However, King 
further teaches that each speaker is processed using a different dictionary of different 
speaker-trained data (col. 5, lines 26-47 and col. 6, lines 35-67, user-specific files are
retrieved to process the user's input speech).

Because the modified Bonastre et al. and King are analogous art from the same field
of endeavor, it would have been obvious to one of ordinary skill in the art at the
time of invention to modify Bonastre et al. by incorporating the teaching of King in
order to provide improved speech recognition accuracy.

8. Regarding claim 42, Bonastre et al. disclose an apparatus according to claim 39, 
further comprising a monitor which continuously monitors a real-time continuous audio 
stream and performing the steps of: digitizing the continuous audio stream (ADC is 
inherently included in a digital system); detecting a speaker change in the digitized 
audio stream (sections 2.1-2.2 on page 1178 and referring to abstract section); 
performing a speaker recognition if a speaker change is detected (section 3 on page 
1179). Bonastre et al. fail to disclose the step of transcribing at least part of the 
continuous audio stream if a predetermined speaker is recognized. However, Glickman
et al. teach the step of transcribing at least part of the continuous audio stream if the
known speaker is recognized (col. 5, lines 30-67).

Because Bonastre et al. and Glickman et al. are analogous art from the same field of
endeavor, it would have been obvious to one of ordinary skill in the art at the time
of invention to modify Bonastre et al. by incorporating the teaching of Glickman et
al. in order to provide automatic closed captioning using speaker-dependent models to
enhance speech recognition accuracy.

9. Regarding claims 20, 26, 36-37, and 44-45, Bonastre et al. fail to disclose a 
method, apparatus and computer readable medium according to claims 19, 25, 31, and 
39, comprising the further step of protocolling time information for detected speaker 
changes. However, Glickman et al. further teach the step of protocolling time 
information for detected speaker changes (timing info 332 in figure 3). 

Because Bonastre et al. and Glickman et al. are analogous art from the same field of
endeavor, it would have been obvious to one of ordinary skill in the art at the time
of invention to further modify Bonastre et al. by incorporating the teaching of
Glickman et al. in order to improve alignment of audio segments with corresponding
transcribed text segments.

10. Regarding claims 22-23, 28-29, 38, and 46, Bonastre et al. further disclose a
method, apparatus, and computer readable medium according to claims 19, 25, 31, and
39, wherein the step of detecting a speaker change is accomplished by use of at least
one characteristic audio feature, in particular features derived from the spectrum of the 
audio signal (see figure 2, parameter extraction and feature vector of speech signal); 
and wherein the step of performing a speaker recognition involves the particular steps 
of calculating a speaker signature from the audio stream and comparing the calculated 
speaker signature with at least one known speaker signature (see figure 2, parameter
extraction and feature vector of speech signal; audio characteristics or speech
features/parameters serve as the signature of the target speaker).

11. Regarding claims 24 and 30, Bonastre et al. fail to disclose a method and
apparatus according to claims 19 and 25, for use in a speech recognition or voice 
control system comprising at least two speaker-specific speaker models and/or 
dictionaries, wherein interchanging between the at least two speaker-specific 
dictionaries dependent on the detected speaker change and the corresponding 
recognized speaker. However, Glickman et al. further teach a speech recognition or 
voice control system comprising at least two speaker-specific speaker models and/or 
dictionaries (col. 5, lines 43-62). 

Because Bonastre et al. and Glickman et al. are analogous art from the same field of
endeavor, it would have been obvious to one of ordinary skill in the art at the time
of invention to further modify Bonastre et al. by incorporating the teaching of
Glickman et al. in order to improve speech recognition accuracy.

The modified Bonastre et al. fail to specifically disclose interchanging between
the at least two speaker-specific dictionaries dependent on the detected speaker 
change and the corresponding recognized speaker. However, King further teaches 
interchanging between the at least two speaker-specific dictionaries dependent on the 
detected speaker change and the corresponding recognized speaker (col. 5, lines 26-47 
and col. 6, lines 35-67, user-specific files are retrieved to process the user's input
speech).

Because the modified Bonastre et al. and King are analogous art from the same field
of endeavor, it would have been obvious to one of ordinary skill in the art at the
time of invention to modify Bonastre et al. by incorporating the teaching of King in
order to provide improved speech recognition accuracy.

12. Regarding claims 33 and 41, Bonastre et al. fail to specifically disclose an
apparatus according to claims 31 and 39, further comprising a scanner which 
automatically scans a continuous audio record, in particular a continuous audio stream 
recorded on a data or a signal carrier, and for detecting speaker changes in the 
continuous audio record. However, Glickman et al. further inherently teach such a 
scanner (col. 2, lines 23-37, audio and text data are stored as two files, and files are 
stored in conventional disks or memory). 

Because Bonastre et al. and Glickman et al. are analogous art from the same field of
endeavor, it would have been obvious to one of ordinary skill in the art at the time
of invention to further modify Bonastre et al. by incorporating the teaching of
Glickman et al. in order to enable the system to perform speaker change detection and
recognition on any source of audio data.

13. Claims 21, 27, 32, and 40 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Bonastre et al. (IEEE Publication) in view of Glickman et al. (US 
6067059), in view of King (US 6532446), and further in view of Kimber et al. (US 
5598507). 

14. Regarding claims 21, 27, 32, and 40, the modified Bonastre et al. fail to disclose 
a method, apparatus, and computer readable medium according to claims 19, 25, 31, 
and 39, wherein the step of detecting a speaker change and/or the step of performing a 
speaker recognition is/are preceded by the further step of detecting non-speech 
boundaries between continuous speech segments. However, Kimber et al. further 
teach wherein the step of detecting a speaker change and/or the step of performing a 
speaker recognition is/are preceded by the further step of detecting non-speech 
boundaries between continuous speech segments (col. 12, lines 1-10, specifically
elements 212 or 216 in figure 12). 

Because the modified Bonastre et al. and Kimber et al. are analogous art from the
same field of endeavor, it would have been obvious to one of ordinary skill in the
art at the time of invention to further modify Bonastre et al. by incorporating the
teaching of Kimber et al. in order to improve speech recognition accuracy.

Conclusion

The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. Ortega et al. (US 6332122) is considered pertinent to the 
claimed invention. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to HUYEN X. VO whose telephone number is (571) 272-7631.
The examiner can normally be reached on M-F, 9-5:30.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Patrick Edouard, can be reached on 571-272-7603. The fax phone number
for the organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

/HuyenXVo/ 3/10/2008 
Primary Examiner, Art Unit 2626 